skip to main content


Search for: All records

Creators/Authors contains: "Zou, Cliff"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The Windows registry stores a glut of information containing settings and data utilized by the Microsoft operating system (OS) and other applications. For example, information such as user credentials, installed programs, recently used applications and documents, accessed resources such as local, remote, and removable devices can all be found in this database. More revealingly, the registry also has time and date stamps that can help build a timeline of user activities. The Windows registry can be easily queried by either malicious or benign applications. This is possible through the Windows Application Program Interface (API) and other OS built-in utilities. In this paper, we develop and demonstrate a program able to collect and infer a user’s rich activities by accessing the Windows registry alone. This information, also referred to as the user’s digital footprint, can be used to devise an exploit or create a privacy threat. Our custom developed application will demonstrate how a user’s digital footprint can be acquired by a malicious application from a Windows registry, without alerting security software. In addition, this information can be exported to a set of comma delimited files, making it easy to import them into other analysis applications. 
    more » « less
  2. Despite encryption, the packet size is still visible, enabling observers to infer private information in the Internet of Things (IoT) environment (e.g., IoT device identification). Packet padding obfuscates packet-length characteristics with a high data overhead because it relies on adding noise to the data. This paper proposes a more data-efficient approach that randomizes packet sizes without adding noise. We achieve this by splitting large TCP segments into random-sized chunks; hence, the packet length distribution is obfuscated without adding noise data. Our client–server implementation using TCP sockets demonstrates the feasibility of our approach at the application level. We realize our packet size control by adjusting two local socket-programming parameters. First, we enable the TCP_NODELAY option to send out each packet with our specified length. Second, we downsize the sending buffer to prevent the sender from pushing out more data than can be received, which could disable our control of the packet sizes. We simulate our defense on a network trace of four IoT devices and show a reduction in device classification accuracy from 98% to 63%, close to random guessing. Meanwhile, the real-world data transmission experiments show that the added latency is reasonable, less than 21%, while the added packet header overhead is only about 5%.

     
    more » « less
    Free, publicly-accessible full text available September 1, 2024
  3. Deep learning models have been used in creating various effective image classification applications. However, they are vulnerable to adversarial attacks that seek to misguide the models into predicting incorrect classes. Our study of major adversarial attack models shows that they all specifically target and exploit the neural networking structures in their designs. This understanding led us to develop a hypothesis that most classical machine learning models, such as random forest (RF), are immune to adversarial attack models because they do not rely on neural network design at all. Our experimental study of classical machine learning models against popular adversarial attacks supports this hypothesis. Based on this hypothesis, we propose a new adversarial-aware deep learning system by using a classical machine learning model as the secondary verification system to complement the primary deep learning model in image classification. Although the secondary classical machine learning model has less accurate output, it is only used for verification purposes, which does not impact the output accuracy of the primary deep learning model, and, at the same time, can effectively detect an adversarial attack when a clear mismatch occurs. Our experiments based on the CIFAR-100 dataset show that our proposed approach outperforms current state-of-the-art adversarial defense systems. 
    more » « less
    Free, publicly-accessible full text available July 1, 2024
  4. There is an increasing demand for processing large volumes of unstructured data for a wide variety of applications. However, protection measures for these big data sets are still in their infancy, which could lead to significant security and privacy issues. Attribute-based access control (ABAC) provides a dynamic and flexible solution that is effective for mediating access. We analyzed and implemented a prototype application of ABAC to large dataset processing in Amazon Web Services, using open-source versions of Apache Hadoop, Ranger, and Atlas. The Hadoop ecosystem is one of the most popular frameworks for large dataset processing and storage and is adopted by major cloud service providers. We conducted a rigorous analysis of cybersecurity in implementing ABAC policies in Hadoop, including developing a synthetic dataset of information at multiple sensitivity levels that realistically represents healthcare and connected social media data. We then developed Apache Spark programs that extract, connect, and transform data in a manner representative of a realistic use case. Our result is a framework for securing big data. Applying this framework ensures that serious cybersecurity concerns are addressed. We provide details of our analysis and experimentation code in a GitHub repository for further research by the community.

     
    more » « less
  5. In many VoIP systems, Voice Activity Detection (VAD) is often used on VoIP traffic to suppress packets of silence in order to reduce the bandwidth consumption of phone calls. Unfortunately, although VoIP traffic is fully encrypted and secured, traffic analysis of this suppression can reveal identifying information about calls made to customer service automated phone systems. Because different customer service phone systems have distinct, but fixed (pre-recorded) automated voice messages sent to customers, VAD silence suppression used in VoIP will enable an eavesdropper to profile and identify these automated voice messages. In this paper, we will use a popular enterprise VoIP system (Cisco CallManager), running the default Session Initiation Protocol (SIP) protocol, to demonstrate that an attacker can reliably use the silence suppression to profile calls to such VoIP systems. Our real-world experiments demonstrate that this side-channel profiling attack can be used to accurately identify not only what customer service phone number a customer calls, but also what following options are subsequently chosen by the caller in the phone conversation. 
    more » « less
  6. Social media nowadays has a direct impact on people's daily lives as many edge devices are available at our disposal and controlled by our fingertips. With such advancement in communication technology comes a rapid increase of disinformation in many kinds and shapes; faked images are one of the primary examples of misinformation media that can affect many users. Such activity can severely impact public behavior, attitude, and belief or sway the viewers' perception in any malicious or benign direction. Mitigating such disinformation over the Internet is becoming an issue with increasing interest from many aspects of our society, and effective authentication for detecting manipulated images has become extremely important. Perceptual hashing (pHash) is one of the effective techniques for detecting image manipulations. This paper develops a new and a robust pHash authentication approach to detect fake imagery on social media networks, choosing Facebook and Twitter as case studies. Our proposed pHash utilizes a self-supervised learning framework and contrastive loss. In addition, we develop a fake image sample generator in the pre-processing stage to cover the three most known image attacks (copy-move, splicing, and removal). The proposed authentication technique outperforms state-of-the-art pHash methods based on the SMPI dataset and other similar datasets that target one or more image attacks types. 
    more » « less
  7. Message Queuing Telemetry Transport (MQTT) is a popular communication protocol used to interconnect devices with considerable network restraints, such as those found in Internet of Things (IoT). MQTT directly impacts a large number of devices, but the software security of its server ("broker") implementations is not well studied. In this paper, we design, implement, and evaluate a novel fuzz testing model for MQTT. The fuzzer combines aspects of mutation guided fuzzing and generation guided fuzzing to rigorously exhaust the MQTT protocol and identify vulnerabilities in servers. We introduce Markov chains for mutation guided fuzzing and generation guided fuzzing that model the fuzzing engine according to a finite Bernoulli process. We implement "response feedback", a novel technique which monitors network and console activity to learn which inputs trigger new responses from the broker. In total, we found 7 major vulnerabilities across 9 different MQTT implementations, including 6 zero-day vulnerabilities and 2 CVEs. We show that when fuzzing these popular MQTT targets, our fuzzer compares favorably with other state-of-the-art fuzzing frameworks, such as BooFuzz and AFLNet. 
    more » « less
  8. Recent research has shown that in-network observers of WiFi communication (i.e., observers who have joined the WiFi network) can obtain much information regarding the types, user identities, and activities of Internet-of-Things (IoT) devices in the network. What has not been explored is the question of how much information can be inferred by an out-of-network observer who does not have access to the WiFi network. This attack scenario is more realistic and much harder to defend against, thus imposes a real threat to user privacy. In this paper, we investigate privacy leakage derived from an out-of-network traffic eavesdropper on the encrypted WiFi traffic of popular IoT devices. We instrumented a testbed of 12 popular IoT devices and evaluated multiple machine learning methods for fingerprinting and inferring what IoT devices exist in a WiFi network. By only exploiting the WiFi frame header information, we have achieved 95% accuracy in identifying the devices and often their working status. This study demonstrates that information leakage and privacy attack is a real threat for WiFi networks and IoT applications. 
    more » « less